[Feature] Accelerate the comparison for batched#991
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthrough
ChangesBatched futures skip parameter refactoring
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/executorlib/standalone/batched.py (1)
19-25:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftFix
batched_futures: value-dedupedskip_setcan overestimaten_expectedand leave later batches permanently unresolved when duplicate result values occur.In
src/executorlib/standalone/batched.py,n_expected = min(n, len(lst) - len(skip_set)), but eligibility is computed viav.result() not in skip_set(value-based). Insrc/executorlib/task_scheduler/interactive/dependency.py,skip_setis built as a deduplicated set of prior batch results, so duplicates inlstthat were already consumed by an earlier batch still remove all remaining occurrences by value, whilen_expectedonly subtracts once per distinct value.When there aren’t enough remaining non-skipped values to fill the batch,
batched_futureskeeps returning[], and_update_waiting_taskkeeps the batch waiting forever.Concrete example (single value repeated in the consumed batch):
lstresults:[1, 1, 1, 2],n=3- Batch 1 returns
[1, 1, 1]- Batch 2:
skip_set={1}, son_expected=min(3, 4-1)=3, but only the single future with result2is eligible → returns[]indefinitely.Add tests with duplicate result values (and/or change the algorithm to track already-assigned counts or per-future identity rather than a value-deduped set).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/executorlib/standalone/batched.py` around lines 19 - 25, batched_futures overestimates n_expected because skip_set is value-deduped; change batched_futures (and its use of skip_set) to account for duplicate result values by tracking counts or per-future identity: either accept a skip_counts Counter (value -> number skipped) or build a skip_futures set of already-assigned futures, then compute n_expected as min(n, count of futures in lst not excluded by the skip tracking) and when collecting done results decrement the skip count or mark that specific future as skipped; update references to skip_set, n_expected, lst and done_lst accordingly and add tests for duplicate-result scenarios.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/executorlib/task_scheduler/interactive/dependency.py`:
- Around line 349-353: The current batching builds a value-based skip_set by
flattening f.result() into a set which fails for unhashable batch outputs;
change to an identity/position-based skip key so membership checks don't require
hashing the full result. In
src/executorlib/task_scheduler/interactive/dependency.py replace the flattened
value set with a set of identity keys (e.g. {id(item) for f in
task_wait_dict["kwargs"]["skip_lst"] for item in f.result()} or a tuple of
(future_index, item_index) keys) and update the corresponding membership test in
src/executorlib/standalone/batched.py (the v.result() not in skip_set check) to
compare the same identity/position key (e.g. id(v.result()) or (v_future_index,
v_item_index)) instead of the raw value so unhashable batch items no longer
raise TypeError.
---
Outside diff comments:
In `@src/executorlib/standalone/batched.py`:
- Around line 19-25: batched_futures overestimates n_expected because skip_set
is value-deduped; change batched_futures (and its use of skip_set) to account
for duplicate result values by tracking counts or per-future identity: either
accept a skip_counts Counter (value -> number skipped) or build a skip_futures
set of already-assigned futures, then compute n_expected as min(n, count of
futures in lst not excluded by the skip tracking) and when collecting done
results decrement the skip count or mark that specific future as skipped; update
references to skip_set, n_expected, lst and done_lst accordingly and add tests
for duplicate-result scenarios.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9a5e9403-9861-440d-ac4c-96e84ef390c4
📒 Files selected for processing (2)
src/executorlib/standalone/batched.pysrc/executorlib/task_scheduler/interactive/dependency.py
|
@copilot fix the following two errors in the tests: |
... Fixed in the latest commit. Updated the tests to use |
|
Actionable comments posted: 0 |
|
Actionable comments posted: 0 |
Updated versioning system from vcs-versioning to setuptools-scm and modified version details.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #991 +/- ##
=======================================
Coverage 94.19% 94.19%
=======================================
Files 39 39
Lines 2103 2103
=======================================
Hits 1981 1981
Misses 122 122 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
for more information, see https://pre-commit.ci
Summary by CodeRabbit
Refactor
Tests